Entry Name:  "HKUST-Qiao-MC2"

VAST Challenge 2017
Mini-Challenge 2

 

 

Team Members:

Qiao GU, The Hong Kong University of Science and Technology, qgu@connect.ust.hk     PRIMARY

Hang YIN, The Hong Kong University of Science and Technology, hyinac@connect.ust.hk

Lian CHEN, The Hong Kong University of Science and Technology, lchenbk@connect.ust.hk

Chengzhong LIU, The Hong Kong University of Science and Technology, cliubf@connect.ust.hk

Haotian LI, The Hong Kong University of Science and Technology, hlibg@connect.ust.hk

Xuanwu YUE, The Hong Kong University of Science and Technology, xuanwu.yue@gmail.com

Huamin QU, The Hong Kong University of Science and Technology, huamin@cse.ust.hk

Student Team:  YES

 

Tools Used:

Python scripts using the matplotlib library, written by the student team for the challenge.

PreserVis, an HTML+CSS+JS system built with the Vue.js, d3.js and ECharts libraries, developed by the student team for the challenge.

Tableau

 

Approximately how many hours were spent working on this submission in total?

20 days * 4 hours/day = 80 hours

 

May we post your submission in the Visual Analytics Benchmark Repository after VAST Challenge 2017 is complete? YES

 

Video

https://youtu.be/2MBZLHq4Hvo  

 

 

Questions

MC2.1 – Characterize the sensors’ performance and operation.  Are they all working properly at all times?  Can you detect any unexpected behaviors of the sensors through analyzing the readings they capture? Limit your response to no more than 9 images and 1000 words.

The unexpected behaviors of sensors are organized into patterns and listed below:

 

Pattern 1: Data missing

At 00:00 on certain dates, most or all readings from all sensors were missing, regardless of chemical. Specifically:

2016.04.02 00:00 All missing.

2016.04.06 00:00 All missing.

2016.08.02 00:00 Only sensor 3’s readings of Methylosmolene and AGOC-3A remained.

2016.08.04 00:00 All missing.

2016.08.07 00:00 All missing.

2016.12.02 00:00 All missing.

2016.12.07 00:00 Only the AGOC-3A readings of sensors 6, 7 and 8, the Appluimonia reading of sensor 7 and the Methylosmolene reading of sensor 8 remained.

All of these gaps occurred at the start of a month (between the 2nd and the 7th).
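A minimal sketch of how such gaps could be listed is given below. It is an illustration, not the team's actual script: the record layout (chemical, sensor, timestamp, reading), the function name find_missing_readings and the hourly step are all assumptions made for this example.

```python
from datetime import datetime, timedelta
from itertools import product


def find_missing_readings(records, sensors, chemicals, start, end,
                          step=timedelta(hours=1)):
    """Return {timestamp: [(sensor, chemical), ...]} for every expected
    reading that is absent between start and end (inclusive).

    records: iterable of (chemical, sensor, timestamp, reading) tuples;
    this layout is assumed for the example.
    """
    seen = {(ts, sensor, chem) for chem, sensor, ts, _reading in records}
    missing = {}
    ts = start
    while ts <= end:
        gaps = [(s, c) for s, c in product(sensors, chemicals)
                if (ts, s, c) not in seen]
        if gaps:
            missing[ts] = gaps
        ts += step
    return missing
```

Timestamps for which every (sensor, chemical) pair is listed correspond to the "All missing" entries above, while shorter lists correspond to the partially missing hours.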

 

Pattern 2: Conflicts between volatile organic solvents

             At certain times, a sensor reported two readings of AGOC-3A while its Methylosmolene reading for the same timestamp was missing. The larger of the two AGOC-3A readings was approximately 10 times the smaller one.

             We analyzed when this behavior occurred and found no obvious temporal pattern, except that it happened more often in the daytime and never occurred between 22:00 and 05:00 the next day.

             However, when we combined these records with the wind direction at the time, we found that this behavior happened mostly when the sensor was receiving emissions from one particular company (Kasios Office Furniture). The following figure shows the number of occurrences per wind direction for every sensor. Each circle represents one sensor, and the sensors and companies are drawn at their relative positions.

Figure: Orientation Distribution of Pattern 2 Error

 

             Because AGOC-3A and Methylosmolene are both volatile organic solvents, and AGOC-3A is in fact a substitute for Methylosmolene, we suspect either that the two chemicals react in a way that causes abnormal readings, or that the sensors cannot reliably distinguish one from the other and therefore produce unusual readings when one or both are at a high level.

             In conclusion, these records are neither pure errors nor random occurrences. Instead, they reflect, to some degree, a high reading of AGOC-3A at the time. To handle them in the following analysis, we keep the smaller of the two AGOC-3A readings and ignore the larger one, because the larger reading would otherwise dominate the subsequent analysis.
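The rule of keeping the smaller of two repeated readings can be sketched as follows. This is only an illustration under assumptions: the record layout (chemical, sensor, timestamp, reading) and the function name keep_smaller_duplicates are hypothetical and not taken from the team's code.

```python
from collections import defaultdict


def keep_smaller_duplicates(records):
    """Collapse repeated (chemical, sensor, timestamp) entries to one record,
    keeping the smallest reading of each group.

    records: iterable of (chemical, sensor, timestamp, reading) tuples
    (assumed layout). Groups are returned in order of first appearance.
    """
    grouped = defaultdict(list)
    for chemical, sensor, ts, reading in records:
        grouped[(chemical, sensor, ts)].append(reading)
    return [(chemical, sensor, ts, min(values))
            for (chemical, sensor, ts), values in grouped.items()]
```

Records that appear only once pass through unchanged, so the function can be applied to the whole dataset rather than to AGOC-3A alone.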

 

 

Pattern 3: Sudden change of Methylosmolene release

             Compared with other chemicals, the Methylosmolene readings of all sensors showed sharper changes, including some sudden single-point jumps.

             The following figure shows histograms of the first derivative of the readings, for each chemical and all sensors.

 


Figure: Distribution of the first derivative of readings for all sensors and all chemicals.
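As a rough sketch of how such a distribution might be computed, the first derivative can be approximated by the difference between consecutive readings of one sensor and one chemical, and the differences then counted per bin. The function names and the bin width below are assumptions for illustration, and gaps in the time series are ignored.

```python
import math


def first_differences(values):
    """Discrete first derivative: difference between consecutive readings."""
    return [b - a for a, b in zip(values, values[1:])]


def histogram(diffs, bin_width):
    """Count differences per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = {}
    for d in diffs:
        k = math.floor(d / bin_width)
        counts[k] = counts.get(k, 0) + 1
    return dict(sorted(counts.items()))
```

A chemical with many sudden jumps shows up as a histogram with heavier tails, that is, more counts in bins far from zero.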

 

            

             This shows that, for every sensor, large jumps are more frequent for Methylosmolene than for the other chemicals, which suggests that the sensors are more sensitive to Methylosmolene. To find a pattern in these sudden changes, we drew the same kind of plot as in Pattern 2, counting the sudden changes by the wind direction at the time:

             Figure: Orientation Distribution of Pattern 3 Error

 

             From the figure, we can see that, as in Pattern 2, these sudden changes did not occur randomly. Although these readings may be less regular than the others, they still reflect a high level of Methylosmolene release from Kasios Office Furniture or Roadrunner Fitness Electronics.

             We suspect that the sensors’ response to Methylosmolene is not linear. For lack of further information, we decided to keep these readings unchanged for the following analysis.

 

Pattern 4: Linearly-increasing reading of sensor 4

             For every sensor except sensor 4, the monthly minimum reading of each chemical was very close to zero. For sensor 4, the monthly minimum reading of every chemical increased from month to month.

             After further analysis, we found that sensor 4 had a linearly increasing offset in its readings, as shown in the figure below:

Figure: Line Chart of Sensor 4 of all chemicals versus time.

 

             According to the figure, we suspected that sensor 4 has a linearly increasing offset in its readings of all chemicals. We therefore used linear regression to fit a line to the readings over time for each chemical. For example, the fitted line for Appluimonia is shown below:

Figure: Line Chart of Sensor 4 of Appluimonia versus time with fitting line.

 

Due to the lack of relevant information, we assume that the offset was zero at the beginning of the data (April 1st) and started to increase from that point.

To handle this error, we set the intercept of the fitted line to 0 and subtract the line from the original readings. This yields a steady reading behavior for sensor 4.
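One possible reading of this correction is a least-squares line constrained to pass through the origin, with time measured from the start of the data. The sketch below follows that interpretation; the function names and the choice of time unit are assumptions, and the team's actual fitting procedure may differ.

```python
def slope_through_origin(times, readings):
    """Least-squares slope k of the line reading = k * time (zero intercept).

    times: elapsed time since the start of the data (e.g. hours since
    April 1st 00:00); the unit is an assumption of this example.
    """
    denominator = sum(t * t for t in times)
    if denominator == 0:
        raise ValueError("at least one non-zero time value is required")
    return sum(t * y for t, y in zip(times, readings)) / denominator


def remove_drift(times, readings):
    """Subtract the fitted zero-intercept line from the original readings."""
    k = slope_through_origin(times, readings)
    return [y - k * t for t, y in zip(times, readings)]
```

For example, remove_drift([0, 1, 2, 3], [0, 2, 4, 6]) fits a slope of 2 and returns readings that are all zero, since that series consists of drift only.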

 

Pattern 5: Larger Reading of Sensor 3

Figure: Reading Overview for Sensors

 

As the figure shows, sensor 3 has the largest total reading of all sensors for every chemical. Considering its position (it is not as close to the companies as sensor 6), its summed readings should not be this large, so we think the reading scale of sensor 3 is problematic. Because this effect does not affect the chemical source discovery in the following questions, the data is left unchanged.

 

Pattern 6: Large Reading of Sensor 7 on a Certain Wind Direction

Figure: Polar Plot of Sensor 7 for Different Chemicals

As the figure shows, for every chemical, sensor 7 has a large average reading when the wind comes from one specific direction. Further analysis showed that this pattern results from a few abnormal readings on certain dates in December 2016.

Figure: Polar Plot of Sensor 7 for Different Chemicals

As we can see in the figure, there was a single huge reading at 04:00 on 2016-12-05 for every chemical, which may be one of the readings that caused this pattern. The underlying reason is unknown.

The data is left unchanged, but in the following questions, the polar bar plot of sensor 7 will be partially ignored.

 

MC2.2 – Now turn your attention to the chemicals themselves.  Which chemicals are being detected by the sensor group?  What patterns of chemical releases do you see, as being reported in the data?

Limit your response to no more than 6 images and 500 words.

Figure: Pie Charts of 4 Chemicals for Different Sensors

This graph shows, for each chemical, the share of the summed readings contributed by each sensor. Every pie chart contains all of the sensor colors, which means that every sensor detected all 4 chemicals, in differing amounts.

 

Through analyzing the data, some patterns are observed and listed below:

 

Pattern 1: Growth Trend

Figure: Line Chart of 4 Chemicals’ Release Trend

This graph displays the total amount of each chemical released in these 3 months.

From this graph, it is clear that the released amount of every chemical shows an upward trend. For Appluimonia and Chlorodinine, growth is almost linear from April to December. The amounts of Appluimonia and Chlorodinine released are always lower than those of AGOC-3A and Methylosmolene. AGOC-3A ranks first throughout, followed by Methylosmolene. The release of AGOC-3A rises relatively sharply from April to August, and its growth slows considerably from August to December. In contrast, the amount of Methylosmolene rises slowly from April to August but faster from August to December.

 

Pattern 2: Dramatic Fluctuation of Volatile Organic Solvents

Figure: Line Charts of 4 Chemicals’ Daily Release

This graph shows the daily release of the 4 chemicals in detail. The releases of Appluimonia and Chlorodinine stay stable from day to day, and their daily sums are not high. By comparison, the daily releases of the two VOC-related chemicals change much more dramatically. The release of Methylosmolene often starts to vary after the middle of the month. In August and December, the Methylosmolene readings rose suddenly and quickly fell back to their usual level. In contrast, large fluctuations in the release of AGOC-3A are usually observed at the beginning and in the middle of the month, and its release calms down at the end of the month. This may be attributed to the fact that AGOC-3A and Methylosmolene are substitutes for each other.

 

Pattern 3: Time Pattern of Methylosmolene’s Release

Figure: Heatmap of Methylosmolene’s all time release

This graph is a punch card of Methylosmolene release. Using 5 as the threshold, we count the number of readings over the threshold. The larger a point is, the more readings over the threshold were recorded. It is easy to see that this chemical was usually released more in the evening than in the daytime. From 6 am to 8 pm, this chemical is almost never released, while the count of readings over the threshold is usually large at 22:00. This kind of pattern is hard to find for the other chemicals.
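The counting step behind such a punch card could look like the sketch below. The threshold of 5 comes from the text above; everything else is an assumption for illustration, in particular the function name and the grouping of readings by (date, hour), since the axes of the original chart are not stated here.

```python
from collections import Counter
from datetime import datetime


def count_over_threshold(readings, threshold=5.0,
                         key=lambda ts: (ts.date(), ts.hour)):
    """Count readings strictly above the threshold per punch-card cell.

    readings: iterable of (timestamp, value) pairs for one chemical.
    key: maps a timestamp to a cell; (date, hour) is an assumed default.
    """
    counts = Counter()
    for ts, value in readings:
        if value > threshold:
            counts[key(ts)] += 1
    return counts
```

Each count would then be mapped to the size of the point drawn in the corresponding cell. Passing key=lambda ts: ts.hour instead aggregates over all days and gives the hour-of-day profile directly.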

 

MC2.3 – Which factories are responsible for which chemical releases? Carefully describe how you determined this using all the data you have available. For the factories you identified, describe any observed patterns of operation revealed in the data.

Limit your response to no more than 8 images and 1000 words.

All of the following analysis is based on the error-handled data mentioned in Question 1.

The readings of each sensor are affected by wind, so the wind data must also be considered to determine the contribution of each factory. To better visualize the sensor data in combination with the wind information, we developed the system shown below. Each circle is drawn at the relative position of the corresponding sensor and acts as a polar coordinate plot. The height of each bar stands for the average reading for that wind direction. The color of a bar also encodes the average value: red bars have larger values and green bars have smaller values. With this system, it is easy to identify the direction with the highest reading for each sensor. If the highest bars of the majority of sensors point to one company, it is highly likely that this company is the main source of the chemical.
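The bar heights of one such polar plot can be sketched as the mean reading per wind-direction bin. The example below assumes that each reading has already been paired with the wind direction (in degrees) at that time; the function name and the bin width of 10 degrees are hypothetical choices, not taken from PreserVis.

```python
def mean_reading_by_direction(samples, bin_width=10):
    """Average the readings of one sensor and one chemical per direction bin.

    samples: iterable of (wind_direction_in_degrees, reading) pairs.
    Returns {bin_start_in_degrees: mean reading}, sorted by bin start.
    """
    sums, counts = {}, {}
    for direction, reading in samples:
        # Wrap into [0, 360) and round down to the start of the bin.
        start = int((direction % 360) // bin_width) * bin_width
        sums[start] = sums.get(start, 0.0) + reading
        counts[start] = counts.get(start, 0) + 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

The bin with the largest mean gives the direction of the highest bar for that sensor; comparing these directions across sensors is what lets the bars be checked for a common intersection point.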

 

Appluimonia:

The following figure shows the average reading of each sensor in each wind direction, over all given records.

 

Figure: Orientation Distribution of Appluimonia of all time.

 

            This graph shows the sensor readings of Appluimonia. The highest bars of most sensors clearly point to Indigo Sol Boards. In particular, one red bar of sensor 6, the sensor located in the center of the companies’ region, points directly at this company, which means that most of the large readings came from this direction.

            In conclusion, Indigo Sol Boards is the most likely source of Appluimonia.

 

Chlorodinine:


 Figure: Orientation Distribution of Chlorodinine of all time.

 

            This graph shows the sensor readings of Chlorodinine. The directions from which large readings arrived at sensors 1, 3, 4, 6 and 8 intersect at the location of Roadrunner Fitness Electronics. This means that Chlorodinine was mainly released by Roadrunner Fitness Electronics.

 

Methylosmolene:

       
Figure: Orientation Distribution of Methylosmolene of all time.

 

            This graph shows the sensor readings of Methylosmolene. In this figure, the high bars of all sensors intersect at the location of Kasios Office Furniture. Moreover, according to the analysis in Question 1, the Methylosmolene readings change more sharply and more suddenly than those of the other chemicals, which this figure reflects as a few very large bars that overwhelm the others. Although the sensors may be more sensitive to this chemical, the readings still reflect a high level of Methylosmolene emission.

So clearly, these large readings all came from Kasios Office Furniture.

 

AGOC-3A:

            Before the error handling method was applied to the dataset, the reading distribution was:



 Figure: Orientation Distribution of AGOC-3A of all time before error elimination.

 

            But after handling the errors mentioned in Question 1 Pattern 2:


            Figure: Orientation Distribution of AGOC-3A of all time after error elimination.

            It is clear that the suspicious contribution from Kasios Office Furniture is eliminated by the error handling method from Question 1 Pattern 2 (when a repeated reading occurs, discard the larger value and keep the smaller one). Only Radiance ColourTek is pointed to in the second figure.

            However, it is notable that only releases from Kasios Office Furniture caused this error. Considering the release of Methylosmolene by Kasios and the possible reaction between the two chemicals mentioned in Question 1 Pattern 2, we suspect that it was precisely the high-level release of both AGOC-3A and Methylosmolene from Kasios that triggered the repeated readings, and hence the elimination of its contribution. We therefore still consider Kasios Office Furniture the major source of AGOC-3A and Radiance ColourTek the secondary source.

 

Conclusion:

            The contribution of each company to each chemical is shown in the following table:

 

Company                          Appluimonia   Chlorodinine   AGOC-3A   Methylosmolene
Indigo Sol Boards                1             -              -         -
Roadrunner Fitness Electronics   -             1              -         -
Radiance ColourTek               -             -              1         -
Kasios Office Furniture          -             -              1         1